A method and apparatus for encoding characteristics for the retrieval of information. Depending 
on the characteristics, some methods for retrieving information may be preferred. If information is too 
large to utilize UDP (User Datagram Protocol), TCP (Transmission Control Protocol) may be preferred. If information is not cacheable, it 
is preferable to retrieve the information directly from the server instead of searching the cache first. A 
URL (Uniform Resource Locator) is utilized on the internet to specify the application protocol (e.g., 
http), the domain name (e.g., www.sun.com), and file location (e.g., /users/hcn/index.html). The suffix 
of a file indicator is utilized to identify how to process the data or information subsequent to retrieval. 
One or more embodiments of the invention provide for encoding characteristics of data to be transferred 
that indicate or hint at an optimal method to retrieve the data. For example, the URL may specify that 
TCP is the preferred transfer protocol, thereby avoiding an attempted transfer using UDP. Additionally, 
the encoding may specify that the client should preferably retrieve the information directly from the 
server instead of searching the proxy cache. The characteristics or preferred retrieval method may be 
encoded in any portion of a URL. Additionally, one or more embodiments of the invention provide 
for backwards compatibility with existing internet browsers by encoding the characteristics in the file 
location portion of the URL instead of the application protocol identifier portion. 

METHOD AND APPARATUS FOR ENCODING CONTENT CHARACTERISTICS



BACKGROUND OF THE INVENTION 
1. FIELD OF THE INVENTION 



This invention relates to the field of computer software, and, more 
specifically, to optimizing network traffic. 

Portions of the disclosure of this patent document contain material 
that is subject to copyright protection. The copyright owner has no objection 
to the facsimile reproduction by anyone of the patent document or the patent 
disclosure as it appears in the Patent and Trademark Office file or records, but 
otherwise reserves all copyright rights whatsoever. Sun, Sun Microsystems,
the Sun logo, Solaris, Java, JavaOS, JavaStation, HotJava Views and all
Java-based trademarks and logos are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States and other countries.

2. BACKGROUND ART 

In a computer network environment, a computer user (client) may try 
to obtain a file from a central storage location (server). Existing schemes can 
waste time looking for the file and can often use an inefficient delivery
method to provide the file to the user. These problems can be understood by 
reviewing networks and how they work. 

A. Networks

In modern computing environments, it is commonplace to employ 
multiple computers or workstations linked together in a network to 
communicate between, and share data with, network users. A network also
may include resources, such as printers, modems, file servers, etc., and may 
also include services, such as electronic mail. Transferring information 
across a network may be a time consuming process. The prior art does not 
provide an efficient manner to optimize the transfer and retrieval of 
information on a network.

A network can be a small system that is physically connected by cables 
or via wireless communication (a local area network or "LAN"), or several
separate networks can be connected together to form a larger network (a wide
area network or "WAN"). Other types of networks include the internet,
telecom networks, the World Wide Web, intranets, extranets, wireless networks,
and other networks over which electronic, digital, and/or analog data may be 
communicated. 

The Internet is a client/server system. A "client" is the computer that
you use to access the Internet. When you log onto the World Wide Web 
portion of the Internet, you view "web pages" that are stored on a remote 
"server" computer. Information including data, files, and the web pages to be 
viewed are often transferred between the client and the server. Depending
on the type of information transferred, the server or client may have to
evaluate the information prior to processing. Additionally, one method for 
transferring the data may be more efficient than another method depending 
on the type of data being transferred. Some background on the Internet helps
provide an understanding of these problems. 

The Internet is a worldwide network of interconnected computers. An 
Internet client accesses a computer on the network via an Internet provider. 
An Internet provider is an organization that provides a client (e.g., an 
individual or other organization) with access to the Internet (via analog 
telephone line or Integrated Services Digital Network line, for example). A 
client can, for example, download a file from or send an electronic mail 
message to another computer/client using the Internet. An Intranet is an
internal corporate or organizational network that uses many of the same
communications protocols as the Internet. The terms Internet, World Wide
Web (WWW), and Web as used herein include the Intranet as well as the
Internet.

The components of the WWW include browser software, network 
links, and servers. The browser software, or browser, is a user-friendly 
interface (i.e., front-end) that simplifies access to the Internet. A browser 
allows a client to communicate a request without having to learn a 
complicated command syntax, for example. A browser typically provides a 
graphical user interface (GUI) for displaying information and receiving input. 
Examples of browsers currently available include Netscape Navigator and 
Internet Explorer. 

A browser displays information to a client or user as pages or 
documents. A language called Hypertext Markup Language (HTML) is used 
to define the format for a page to be displayed in the browser. A Web page is 
transmitted to a client as an HTML document. The browser executing at the 
client parses the document and produces and displays a Web Page based on
the information in the HTML document. Consequently, the HTML 
document defines the Web Page that is rendered at runtime on the browser. 

B. Network Communication/Data Transfer

Information servers maintain the information on the WWW and are 
capable of processing a client request. To enable the computers on a network
including the WWW to communicate with each other, a set of standardized
rules for exchanging the information between the computers, referred to as a 
"protocol", is utilized. Transfer Protocols generally specify the data format, 
timing, sequencing, and error checking of data transmissions. Numerous 
transfer protocols are used in the networking environment. For example,
one family of transfer protocols is referred to as the transmission control
protocol/internet protocol ("TCP/IP"). The TCP/IP family of transfer 
protocols is the set of transfer protocols used on the internet and on many 
multiplatform networks. 

1. Transfer Protocols 

The TCP/IP transfer protocol family is made up of numerous 
individual protocols (e.g., file transfer protocol ("FTP"), transmission control 
protocol ("TCP"), and network terminal protocol ("TELNET")). The TCP
protocol is responsible for breaking up a message to be transmitted into 
datagrams of manageable size, reassembling the datagrams at the receiving 
end, resending any datagrams that get lost (or are not transferred), and 
reordering the data (from the datagrams) in the appropriate order. A
datagram is a unit of data or information (also referred to as a packet) that is 
transferred or passed across the internet. A datagram contains a source and 
destination address along with the data. The TCP transfer protocol is often 
utilized to transmit large amounts of information because of its ability to
break up the information into datagrams and reassemble the information at 
the receiving end. 

Another transfer protocol that is utilized to control the transfer of 
information is the user datagram protocol ("UDP"). UDP is designed for
applications and data transmissions where sequences of datagrams do not 
need to be reassembled at the receiving end. UDP does not keep track of what 
has been transmitted in order to resend a datagram if necessary. Additionally, 
UDP's header information (information regarding the source and destination 
and other relevant information) is shorter than the header information
utilized in TCP. 
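
The size trade-off described above can be made concrete with the header sizes fixed by the IPv4, UDP, and TCP specifications. The following is a minimal sketch; the helper name suggest_transport and the single-datagram threshold are illustrative assumptions, not part of the described method, and practical limits are usually lower because of the link MTU.

```python
# Sizes fixed by the IPv4, UDP, and TCP specifications.
IPV4_MAX_PACKET = 65535  # largest IPv4 packet, in bytes
IPV4_MIN_HEADER = 20     # IPv4 header without options
UDP_HEADER = 8           # fixed UDP header size
TCP_MIN_HEADER = 20      # TCP header without options (longer than UDP's)

# Largest payload that fits in a single UDP datagram over IPv4.
MAX_UDP_PAYLOAD = IPV4_MAX_PACKET - IPV4_MIN_HEADER - UDP_HEADER


def suggest_transport(payload_bytes: int) -> str:
    """Hypothetical helper: suggest UDP when the payload fits in one datagram."""
    if payload_bytes < 0:
        raise ValueError("payload size cannot be negative")
    return "UDP" if payload_bytes <= MAX_UDP_PAYLOAD else "TCP"
```

Under these assumptions, a 512-byte reply would be suggested for UDP, while a multi-megabyte file would be suggested for TCP.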

2. Application Protocols 

To utilize a Transfer Protocol to transfer information, an 
Application Protocol that defines a set of commands which one machine 
sends to another is utilized (e.g., commands to specify who the sender of the 
message is, who it is being sent to, and the text of the message). The Transfer 
Protocol (e.g., TCP or UDP) is utilized to ensure that the Application Protocol
commands are completely transmitted to the receiving end. HyperText 
Transfer Protocol (HTTP) is the standard application protocol for 
communication with an information server on the WWW. HTTP has 
communication methods that allow clients to request data from a server and
send information to the server. 

To submit a request, the client contacts the HTTP server and transmits 
the request to the HTTP server. The request contains the communication
method requested for the transaction (e.g., GET an object from the server or 
POST data to an object on the server). The HTTP server responds to the client 
by sending a status of the request and the requested information. The 
connection is then terminated between the client and the HTTP server. 

A client request, therefore, consists of establishing a connection
between the client and the HTTP server, performing the request, and 
terminating the connection. The HTTP server does not need to maintain any 
state about the connection once it has been terminated. HTTP is, therefore, a 
stateless application protocol. That is, a client can make several requests of an
HTTP server, but each individual request is treated independently of any other
request. The server has no recollection of any previous request. The server 
does not need to retain state from a prior request. 

3. Proxies 

Instead of transmitting the information from the server that 
maintains the information, some systems utilize what is referred to as a 
proxy. Referring to Figure 1, a proxy 102 is a server that carries out requests
transmitted to it (i.e., from client 100), keeping copies of fetched documents or 
information for some time so that they can be accessed more quickly in the 
future, speeding up access for commonly requested information. This 
maintaining of information and fetched documents by the proxy 102 is
referred to as caching and the information maintained in the proxy 102 is 
referred to as a cache or proxy cache. 

A proxy 102 may be viewed as an intermediary between the server 104
and client 100. Referring to Figure 1 and Figure 2, at step 202, the client 100 
requests information. At step 204, the request is forwarded to the proxy 102. 
At step 206, the client 100 first checks the proxy cache to see if the relevant 
information is maintained by the proxy 102. If the proxy cache contains the
information, the client 100 does not need to contact the server 104 and loads
the information directly from the proxy 102 at step 208. Alternatively, if the 
proxy cache does not contain the relevant information, the request is 
forwarded to the server 104 at step 210. At step 212, the client 100 retrieves the 
information from the server 104. When HTTP is the protocol that is being
transmitted over the internet protocol, the proxy 102 is referred to as a web
proxy. 
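
The flow of Figure 2 can be sketched as follows. This is only an illustration under stated assumptions: the function name fetch_via_proxy, the dictionary standing in for the proxy cache, and the fetch_from_server callback are hypothetical, and the returned list simply records which of the steps named above were taken.

```python
from typing import Callable, Dict, List, Tuple


def fetch_via_proxy(
    url: str,
    proxy_cache: Dict[str, bytes],
    fetch_from_server: Callable[[str], bytes],
) -> Tuple[bytes, List[int]]:
    """Hypothetical model of Figure 2: returns the data and the steps taken."""
    steps = [202, 204, 206]      # request, forward to proxy, check proxy cache
    if url in proxy_cache:
        steps.append(208)        # cache hit: load directly from the proxy
        return proxy_cache[url], steps
    steps.append(210)            # cache miss: forward the request to the server
    body = fetch_from_server(url)
    steps.append(212)            # retrieve the information from the server
    proxy_cache[url] = body      # the proxy keeps a copy for later requests
    return body, steps
```

A first request for a URL walks steps 202 through 212, while a repeat request for the same URL stops at step 208.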

To protect information in internal computer networks from external 
access, a firewall is utilized. A firewall is a mechanism that blocks access
between the client and the server. To provide limited access to information,
a proxy or proxy server may sit atop a firewall and act as a conduit, providing 
a specific connection for each network connection. Proxy software retains the 
ability to communicate with external sources, yet is trusted to communicate 
with the internal network. For example, proxy software may require a
username and password to access certain sections of the internal network and
completely block other sections from any external access. 

C. Addressing Scheme and Client/Server Data Retrieval

An addressing scheme is employed to identify Internet resources (e.g., 
HTTP server, file or program). This addressing scheme is called Uniform 
Resource Locator (URL). A URL may contain the application protocol to use 
when accessing the server (e.g., HTTP), the Internet domain name (also 
referred to as the server host name) of the site on which the server is 
running, the port number of the server (the port number may not be 
specified in the URL but obtained by translating the server host name), and 
the location of the resource in the file structure of the server. For example, 
the URL "http://www.sunlabs.com/research/hsn/index.html" specifies the 
application protocol ("http"), the server host name ("www.sunlabs.com"),
and the filename to be retrieved ("/research/hsn/index.html"). 
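
As an illustration of this decomposition, the example URL can be split with Python's standard urllib.parse module. The DEFAULT_PORTS table is an assumption added for the sketch; it reflects the conventional HTTP port, which applies when the URL carries no explicit port number.

```python
from urllib.parse import urlsplit

# Conventional default port, used when the URL does not name one.
DEFAULT_PORTS = {"http": 80}

parts = urlsplit("http://www.sunlabs.com/research/hsn/index.html")

scheme = parts.scheme        # application protocol: "http"
host = parts.hostname        # server host name: "www.sunlabs.com"
path = parts.path            # file location: "/research/hsn/index.html"
port = parts.port or DEFAULT_PORTS.get(scheme)  # no explicit port, so 80
```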



If the client request is for a file, the HTTP server locates the file and
sends it to the client. An HTTP server also has the ability to delegate work to 
Common Gateway Interface (CGI) programs. The CGI specification defines 
the mechanisms by which HTTP servers communicate with gateway 
programs. A gateway program is referenced using a URL. The HTTP server
activates the program specified in the URL and uses CGI mechanisms to pass
program data sent by the client to the gateway program. Data is passed from 
the server to the gateway program via command-line arguments, standard 
input, or environment variables. The gateway program processes the data, 
generates an HTML document, and returns the HTML document as its
response to the server using CGI (via standard output, for example). The
server forwards the HTML document to the client using HTTP.

Once files have been retrieved, the client may utilize or process the file.
For example, if an HTML document is retrieved, a client's web browser may
parse the HTML document and display the document. Depending on the 
type of file retrieved, the client may activate an application to process the file. 
For example, if a word processing document is retrieved, the client may 
activate a word processor to process the document. Alternatively, if an image 
file is retrieved, an image viewer may be activated to process and display the 
image. 

To identify the type of file that is retrieved so that the client may know 
how to process the file subsequent to retrieval, a file suffix or extension may 
be utilized. A file extension or suffix often consists of a period and several 
letters that are attached to the end of a file name. For example, an HTML 
document may end with the suffix ".htm" or ".html" (e.g., "index.html" or 
"home.html"), a word processing document filename may end with the 
suffix ".doc" (e.g., "report.doc" or "letter.doc"), a JPEG (Joint Photographic
Experts Group) image filename may end with the suffix ".jpg" (e.g., 
"image.jpg" or "picture.jpg"), and a postscript document (document created 
in the postscript page description language) may end with the suffix ".ps" 
(e.g., "calendar.ps" or "font.ps"). 

Upon receiving a file, the client browser will typically examine the 
extension to determine how to process the file after receipt (e.g., launch an 
application program to process the file). 
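
A minimal sketch of this suffix-based dispatch is shown below. The HANDLERS table and the handler_for function are hypothetical names chosen for the illustration; the suffixes are the ones listed above, and the handler descriptions are placeholders rather than real application names.

```python
import posixpath

# Hypothetical mapping from file suffix to the kind of application to launch.
HANDLERS = {
    ".htm": "HTML browser",
    ".html": "HTML browser",
    ".doc": "word processor",
    ".jpg": "image viewer",
    ".ps": "PostScript viewer",
}


def handler_for(filename: str) -> str:
    """Pick a handler from the suffix that follows the last period."""
    _, suffix = posixpath.splitext(filename)
    return HANDLERS.get(suffix.lower(), "unknown")
```

Note that the suffix is inspected only after the file has been received, so it cannot influence how the file is transferred.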

The above described methods are slow and inefficient in retrieving a 
file from the client or server. For example, with a web proxy, the client first 
searches the web proxy cache for the relevant web pages or information and if 
not present then processes the request with the server requiring a second
search. Additionally, although it may be more efficient to utilize a UDP 
transfer protocol instead of TCP (i.e., for smaller files), this is not done. 
Referring to Figure 3, the prior art first attempts to transfer information using 
UDP at step 302. At step 304, a determination is made regarding whether UDP 
was acceptable and was utilized to transmit the requested information. If the 
transfer attempt failed and UDP was not acceptable, the web browser will 
attempt the transfer using TCP at step 306. (Repetitively attempting a transfer 
with an improper transfer protocol creates a large overhead. Further, the 
prior art does not provide for an efficient method to optimize the retrieval of 
a file or information from a server or client). If the transfer using UDP is 
acceptable, the transfer proceeds using UDP at step 309. 
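
The prior art flow of Figure 3 can be sketched as follows. The function name prior_art_transfer and the two callbacks are hypothetical; try_udp is assumed to return None when UDP is not acceptable, and the returned list records the step numbers used in the text above.

```python
from typing import Callable, List, Optional, Tuple


def prior_art_transfer(
    try_udp: Callable[[], Optional[bytes]],
    try_tcp: Callable[[], bytes],
) -> Tuple[bytes, List[int]]:
    """Hypothetical model of Figure 3: UDP is always attempted first."""
    steps = [302]                # attempt the transfer using UDP
    data = try_udp()
    steps.append(304)            # was UDP acceptable?
    if data is not None:
        steps.append(309)        # yes: the transfer proceeds using UDP
        return data, steps
    steps.append(306)            # no: a second attempt is made using TCP
    return try_tcp(), steps
```

When UDP is not acceptable, the failed attempt at steps 302 and 304 is pure overhead, which is the cost the encoded hints are meant to avoid.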

SUMMARY OF THE INVENTION

A method and apparatus for encoding characteristics for the retrieval 
of information. In a network of computers, clients and servers communicate 
and exchange data and information. Depending on characteristics of the data
or information, certain methods for exchanging or retrieving the 
information may be preferred. For example, if the information is too large to 
utilize a UDP (User Datagram Protocol) transfer protocol, then the TCP 
(Transmission Control Protocol) transfer protocol may be preferred. In 
addition, if the information is not cacheable, then it is preferable to retrieve
the information directly from the server instead of searching the cache first. 

A URL (Uniform Resource Locator) is utilized on the internet to 
specify the application protocol (e.g., http), the domain name (e.g., 
www.sun.com), and file location (e.g., /users/hcn/index.html). The suffix of
a file indicator is utilized to identify how to process the data or information 
subsequent to retrieval. 

One or more embodiments of the invention provide for encoding 
characteristics of data to be transferred that indicate or hint at an optimal
method to retrieve the data. For example, the URL may specify that TCP is 
the preferred transfer protocol, thereby avoiding an attempted transfer using 
UDP. Additionally, the encoding may specify that the data is not cacheable so 
that the client may retrieve the information directly from the server instead 
of searching the proxy cache. This encoded information may be published by
the server and parsed by the client prior to executing data retrieval. Thus, the
client may retrieve the data efficiently, decreasing the overhead utilized and
influencing the behavior of the data transfer over a network.

The characteristics or preferred retrieval method may be encoded in
any portion of a URL. Additionally, one or more embodiments of the
invention provide for backwards compatibility with existing internet
browsers by encoding the characteristics in the file location portion of the
URL instead of the application protocol identifier portion.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates the client, server, and proxy relationship. 

Figure 2 demonstrates a prior art method for retrieving information
with a proxy.

Figure 3 demonstrates a prior art method for retrieving information 
using TCP or UDP. 

Figure 4 is a block diagram of one embodiment of a computer system 
capable of providing a suitable execution environment for one or more 
embodiments of the invention. 

Figure 5 demonstrates a method for the encoding and use of
information in URLs according to one or more embodiments of the 
invention. 

DETAILED DESCRIPTION OF THE INVENTION

The invention is a method and apparatus for encoding content 
characteristics. In the following description, numerous specific details are set 
forth to provide a more thorough description of embodiments of the 
invention. It is apparent, however, to one skilled in the art, that the 
invention may be practiced without these specific details. In other instances, 
well known features have not been described in detail so as not to obscure the 
invention. 

Embodiment of Computer Execution Environment (Hardware)

An embodiment of the invention can be implemented as computer 
software in the form of computer readable code executed on a general 
purpose computer such as computer 400 illustrated in Figure 4, or in the form 
of bytecode class files running on such a computer. A keyboard 410 and 
mouse 411 are coupled to a bi-directional system bus 418. The keyboard and 
mouse are for introducing user input to the computer system and 
communicating that user input to processor 413. Other suitable input devices 
may be used in addition to, or in place of, the mouse 411 and keyboard 410. 
I/O (input/output) unit 419 coupled to bi-directional system bus 418 
represents such I/O elements as a printer, A/V (audio/video) I/O, etc. 

Computer 400 includes a video memory 414, main memory 415 and 
mass storage 412, all coupled to bi-directional system bus 418 along with 
keyboard 410, mouse 411 and processor 413. The mass storage 412 may 
include both fixed and removable media, such as magnetic, optical or 
magnetic optical storage systems or any other available mass storage 
technology. Bus 418 may contain, for example, thirty-two address lines for
addressing video memory 414 or main memory 415. The system bus 418 also 
includes, for example, a 32-bit data bus for transferring data between and 
among the components, such as processor 413, main memory 415, video 
memory 414 and mass storage 412. Alternatively, multiplex data/address
lines may be used instead of separate data and address lines. 

In one embodiment of the invention, the processor 413 is a 
microprocessor manufactured by Motorola, such as the 680X0 processor or a 
microprocessor manufactured by Intel, such as the 80X86, or Pentium 
processor, or a SPARC microprocessor from Sun Microsystems, Inc. 
However, any other suitable microprocessor or microcomputer may be 
utilized. Main memory 415 is comprised of dynamic random access memory 
(DRAM). Video memory 414 is a dual-ported video random access memory. 
One port of the video memory 414 is coupled to video amplifier 416. The 
video amplifier 416 is used to drive the cathode ray tube (CRT) raster monitor 
417. Video amplifier 416 is well known in the art and may be implemented 
by any suitable apparatus. This circuitry converts pixel data stored in video 
memory 414 to a raster signal suitable for use by monitor 417. Monitor 417 is 
a type of monitor suitable for displaying graphic images. 

Computer 400 may also include a communication interface 420 
coupled to bus 418. Communication interface 420 provides a two-way data 
communication coupling via a network link 421 to a local network 422. For 
example, if communication interface 420 is an integrated services digital
network (ISDN) card or a modem, communication interface 420 provides a 
data communication connection to the corresponding type of telephone line, 
which comprises part of network link 421. If communication interface 420 is

The received code may be executed by processor 413 as it is received,
and/or stored in mass storage 412, or other non-volatile storage for later 
execution. In this manner, computer 400 may obtain application code in the 
form of a carrier wave. 

Application code may be embodied in any form of computer program 
product. A computer program product comprises a medium configured to 
store or transport computer readable code, or in which computer readable 
code may be embedded. Some examples of computer program products are 
CD-ROM disks, ROM cards, floppy disks, magnetic tapes, computer hard
drives, servers on a network, and carrier waves. 



The computer systems described above are for purposes of example 
only. An embodiment of the invention may be implemented in any type of 
computer system or programming or processing environment.



Embodiment of Software Apparatus for Encoding Content Characteristics 

In one or more embodiments of the invention, a variety of facts or
hints are encoded into URLs to enable client side optimizations as well as 
optimizations at intermediate processing points, such as proxy caches or 
application layer firewalls. The facts or hints are characteristics of the data 
that is to be transferred that allow for optimization in the retrieval of the
information from the server thereby influencing the transfer of information
across a network. Further, the facts or hints may be encoded into any part of a 
URL. 

Figure 5 demonstrates a method for the encoding and use of
information in URLs according to one or more embodiments of the 
invention. A server has knowledge about the details and characteristics of 
the information and files the server maintains (e.g., the length of the file, or 
the fact that the file is cacheable). Such information and characteristics are 
known by the server prior to a file being requested by a client. At step 500, the 
server or web server may publish one or more of the characteristics regarding 
the file or transfer of the file in the URL for the file. Such a publication may 
be a universal convention that applies across all clients and servers. 
Alternatively, the server may provide the encoded information to the clients 
in a manner that is similar to the way servers provide cookies (a small piece 
of information that can later be read back from a browser) to a client or 
browser. For example, when a server first responds to a client's request, the 
server may include an additional (and optional) field in the reply that 
informs the client about the various retrieval methods and how such
information is encoded (referred to as a retrieval identifier). When future
information is encoded with a retrieval identifier, the client maintains the 
knowledge regarding the meaning of the retrieval identifier and how to 
retrieve the encoded information. For example, a single retrieval identifier 
statement would include the following information:

<property><pattern><applicable host domain><lifetime>

where the property field may be "not-cacheable" or "TCP-only"; the
pattern field may contain "*-nc-*"; the applicable host domain field may
be "*.sun.com"; and the lifetime field is an optional field that informs the
clients when this information expires.
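
One way a client might parse and apply such a statement is sketched below. Everything here is an assumption made for illustration: the class and function names are hypothetical, the pattern is assumed to be a shell-style wildcard matched against the URL path, the host domain is assumed to be a wildcard matched against the host name, and the lifetime is kept as an opaque string because the text does not specify its units.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class RetrievalIdentifier:
    prop: str                       # e.g. "not-cacheable" or "TCP-only"
    pattern: str                    # wildcard matched against the URL path (assumed)
    host_domain: str                # wildcard matched against the host name (assumed)
    lifetime: Optional[str] = None  # optional field; units are not specified

    def applies_to(self, url: str) -> bool:
        """True when both the host name and the path match the wildcards."""
        parts = urlsplit(url)
        host = parts.hostname or ""
        return fnmatchcase(host, self.host_domain) and fnmatchcase(
            parts.path, self.pattern
        )


def parse_retrieval_identifier(statement: str) -> RetrievalIdentifier:
    """Split "<a><b><c>" or "<a><b><c><d>" into its angle-bracketed fields."""
    fields = re.findall(r"<([^<>]*)>", statement)
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")
    return RetrievalIdentifier(*fields)
```

For instance, a statement such as "<not-cacheable><*-nc-*><*.sun.com>" would, under these assumptions, mark any URL on a sun.com host whose path contains "-nc-" as not cacheable.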

At step 502, the client views and examines the published characteristics
or transmitted information. At step 504, the client transmits the request to 
the server (using an application protocol, e.g., HTTP). The client request uses 
the desired retrieval method. The retrieval method may consist of the 
transfer protocol (i.e., TCP or UDP) or it may consist of caching information
(i.e., whether the information being transmitted is cacheable or not). At step 
506, the server transmits the information to the client using the specified 
retrieval method. At step 508, the client receives the information and 
processes it as desired. The post-transfer processing may consist of initializing 
an application program to read or display the file.

Step 500 of Figure 5 demonstrates the encoding aspects of the present 
invention. By encoding retrieval or file characteristics in the URL, the client 
may optimize the retrieval of the information. Some examples of the hints 
or characteristics that can be encoded are "Don't use UDP", "Please use UDP",
"Don't use TCP", "Please use TCP", "Do not cache", "Cache if possible", or 
"Don't retrieve file from proxy". These messages may be encoded in any 
portion of the URL. 



For example, referring to Figure 3, if the URL specifies that TCP is
preferred, the client will not attempt file retrieval using UDP first (steps 302- 
304), saving on the wasted overhead of a failed UDP transfer attempt. Instead, 
the client will retrieve the information using TCP at step 306 without a failed 
transfer attempt, eliminating steps 302-304. 
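
The effect of a "TCP preferred" hint on the Figure 3 flow can be sketched as follows. The function name retrieve, the prefer_tcp flag, and the callbacks are hypothetical; try_udp is assumed to return None when UDP is not acceptable, and the step list uses the numbering of Figure 3.

```python
from typing import Callable, List, Optional, Tuple


def retrieve(
    prefer_tcp: bool,
    try_udp: Callable[[], Optional[bytes]],
    try_tcp: Callable[[], bytes],
) -> Tuple[bytes, List[int]]:
    """Hypothetical hint-aware transfer: skip UDP when TCP is preferred."""
    if prefer_tcp:
        return try_tcp(), [306]  # go straight to TCP; steps 302-304 are skipped
    steps = [302]                # no hint: fall back to the prior art flow
    data = try_udp()
    steps.append(304)
    if data is not None:
        steps.append(309)
        return data, steps
    steps.append(306)
    return try_tcp(), steps
```

With the hint set, the UDP callback is never invoked, so no failed attempt is made.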

Referring to Figure 2, if the URL specifies that the information is not 
cacheable or is not in cache, the client web browser will go directly to the 
server to retrieve the information at step 212 instead of performing a cache 
lookup on the web proxy first, eliminating steps 204-210. By retrieving the
information directly from the server, the latency for repeat requests is
reduced, the traffic in the network is reduced, and the load on the proxies is
reduced.
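
A minimal sketch of this cache bypass is given below. The names fetch, cacheable, proxy_cache, and fetch_from_server are hypothetical, and a dictionary stands in for the proxy cache; the second element of the result records where the data came from.

```python
from typing import Callable, Dict, Tuple


def fetch(
    url: str,
    cacheable: bool,
    proxy_cache: Dict[str, bytes],
    fetch_from_server: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Hypothetical hint-aware fetch: skip the proxy for non-cacheable data."""
    if not cacheable:
        # The hint says the proxy cannot hold this data, so no lookup is made.
        return fetch_from_server(url), "server"
    if url in proxy_cache:
        return proxy_cache[url], "proxy"
    body = fetch_from_server(url)
    proxy_cache[url] = body      # cacheable data is kept for later requests
    return body, "server"
```

Non-cacheable data never touches the cache in this sketch, so the proxy does no work for it.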



Thus, the URL is encoded with characteristics that will optimize the 
retrieval of files. According to one or more embodiments, the protocol 
identifier can be supplemented with the desired characteristic/retrieval
method. For example, to specify that UDP is the preferable transfer protocol, 
the URL may be "HTTP/UDP" instead of merely "HTTP", or to specify that 
the information is cacheable, the URL may be "HTTP/C". However, web 
browsers may not be configured to process arbitrary data such as a preferred 
retrieval method in the application protocol identifier portion of a URL. 
Thus, users of prior art web browsers will be unable to process any URLs that 
have such data in the application protocol identifier portion of the URL. The 
ability to use an old web browser with a new data format is referred to as 
backwards compatibility. 

To maintain backwards compatibility while providing the desired 
characteristics in the URL, one or more embodiments of the invention 
encode the characteristics in the Internet domain name or resource server 
location portion of the URL. For example, in one or more embodiments, the 
suffix of the resource server location portion may be supplemented with 
additional characters to indicate the retrieval method. In such an 
embodiment, the file suffix "htmlt" in the URL
"http://www.sunlabs.com/research/hsn/index.htmlt" may indicate that TCP
is the preferable transfer protocol to utilize for the file transfer. Similarly, the
file suffix "htmlu" in the URL
"http://www.sunlabs.com/research/hsn/index.htmlu" may indicate that
UDP is the preferable transfer protocol to utilize for the file transfer.
Alternatively, the suffix of the URL could also be supplemented to indicate
whether the information is cacheable or is maintained in the proxy cache.
For example, the file suffix "htmlunc" in the URL
"http://www.sunlabs.com/research/hsn/index.htmlunc" may indicate that
UDP is the preferable transfer protocol and the information is not cacheable.
Any other interesting information may be encoded as well, such as the 
lifetime (for how long it is cacheable). 
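
A client-side decoder for such supplemented suffixes might look like the sketch below. It is only one possible reading of the examples above: the function name, the BASE_EXTENSIONS list, and the rule that an optional "t" or "u" is followed by an optional "nc" are assumptions. A real convention would also have to avoid collisions with existing suffixes that happen to end in these letters.

```python
import posixpath
from typing import Dict, Optional
from urllib.parse import urlsplit

# Known base suffixes, longest first so that "html" is tried before "htm".
BASE_EXTENSIONS = ("html", "htm", "doc", "jpg", "ps")
TRANSPORT_FLAGS = {"t": "TCP", "u": "UDP"}


def decode_suffix_hints(url: str) -> Dict[str, Optional[object]]:
    """Hypothetical decoder: base suffix + optional t/u + optional "nc"."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()
    for base in BASE_EXTENSIONS:
        if not ext.startswith(base):
            continue
        flags = ext[len(base):]
        transport = TRANSPORT_FLAGS.get(flags[:1])
        if transport is not None:
            flags = flags[1:]
        if flags not in ("", "nc"):
            continue             # leftover characters: not a recognized hint
        cacheable = False if flags == "nc" else None
        return {"base": base, "transport": transport, "cacheable": cacheable}
    # Unrecognized suffix: report it unchanged and carry no hints.
    return {"base": ext, "transport": None, "cacheable": None}
```

A value of None means the URL carries no hint for that characteristic, so a client would fall back to its default behavior.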

One or more embodiments of the invention may provide for the
retrieval characteristics to be encoded in the domain name or resource server
location portion of the URL and hide that information from display in the
web browser. Such information may be passed in the form of a parameter in
HTML. For example, the file suffix "html?u" in the URL
"http://www.sunlabs.com/research/hsn/index.html?u" passes the parameter
"u" indicating a UDP transfer.
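
As a hedged illustration of the parameter form, the sketch below separates
the displayable URL from the hint carried after the "?". The name split_hint
is hypothetical, and the mapping of "t" to TCP is an assumption made by
analogy; the text only gives "u" for UDP.

```python
from urllib.parse import urlsplit


def split_hint(url):
    """Return (displayable URL, transport hint or None)."""
    parts = urlsplit(url)
    # Assumed mapping: "u" is from the text, "t" is added by analogy.
    hint = {"u": "UDP", "t": "TCP"}.get(parts.query)
    visible = parts._replace(query="").geturl()
    return visible, hint
```

The first element of the returned pair is what a browser could display, while
the second is used only to choose the retrieval method.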

By encoding the information in the file information or resource server
location portion of a URL, with several pages based in the same server, some
pages may be retrieved in one manner (e.g., from the proxy server or using
UDP), while other pages may be retrieved in another manner (e.g., directly
from the server or using TCP). Further, by encoding information in the URL
prior to retrieving the information, the actual transmission of the file or
information to the client may be optimized. Once the file is retrieved in an
optimized manner by the client, it can then be processed according to the
methods of the prior art (i.e., by invoking an image viewer, word processor,
or HTML document browser to process the retrieved information).




WO 00/13107 

PCT/US99/18990 




Thus, a method and apparatus for encoding content characteristics for
the retrieval of information is described in conjunction with one or more
specific embodiments. The invention is defined by the claims and their full
scope of equivalents.

CLAIMS



1. A method for retrieving information comprising: 
obtaining one or more retrieval characteristics prior to retrieval of said
information; and

retrieving said information based on one or more of said retrieval 
characteristics. 



2. The method of claim 1 wherein one or more of said retrieval
characteristics consists of an indicator to utilize TCP as a transfer protocol.

3. The method of claim 1 wherein one or more of said retrieval
characteristics consists of an indicator to utilize UDP as a transfer protocol.

4. The method of claim 1 wherein one or more of said retrieval
characteristics consists of an indicator regarding whether said information
should be obtained directly from the server.

5. The method of claim 1 wherein said retrieval characteristics are
encoded in a URL.

6. The method of claim 5 wherein said retrieval characteristics are
encoded in a file location portion of said URL.

7. The method of claim 5 wherein said retrieval characteristics are
encoded in a domain name portion of said URL.

8. The method of claim 1 wherein said information is retrieved
from a server by a client on a computer network.

9. A system comprising
a processor;

a memory coupled to said processor;
code executed by said processor configured to retrieve information;
said code comprising:

a method obtaining one or more retrieval characteristics prior
to retrieval of said information; and

a method retrieving said information based on one or more of
said retrieval characteristics.



10. The system of claim 9 wherein one or more of said retrieval
characteristics consists of an indicator to utilize TCP as a transfer protocol.

11. The system of claim 9 wherein one or more of said retrieval
characteristics consists of an indicator to utilize UDP as a transfer protocol.

12. The system of claim 9 wherein one or more of said retrieval
characteristics consists of an indicator regarding whether said information
should be obtained directly from the server.

13. The system of claim 9 wherein said retrieval characteristics are
encoded in a URL.

14. The system of claim 13 wherein said retrieval characteristics are
encoded in a file location portion of said URL.

15. The system of claim 13 wherein said retrieval characteristics are
encoded in a domain name portion of said URL.

16. The system of claim 9 wherein said code is executed on a
computer network and said information is retrieved from a server by a client.

17. A computer program product comprising

a computer usable medium having computer readable program code
embodied therein configured to retrieve information, said computer program
product comprising:

computer readable code configured to cause a computer to obtain one
or more retrieval characteristics prior to retrieval of said information; and

computer readable code configured to cause a computer to retrieve said
information based on one or more of said retrieval characteristics.



18. The computer program product of claim 17 wherein one or
more of said retrieval characteristics consists of an indicator to utilize TCP as
a transfer protocol.

19. The computer program product of claim 17 wherein one or
more of said retrieval characteristics consists of an indicator to utilize UDP as
a transfer protocol.

20. The computer program product of claim 17 wherein one or
more of said retrieval characteristics consists of an indicator regarding
whether said information should be obtained directly from the server.

21. The computer program product of claim 17 wherein said
retrieval characteristics are encoded in a URL.

22. The computer program product of claim 21 wherein said
retrieval characteristics are encoded in a file location portion of said URL.

23. The computer program product of claim 21 wherein said
retrieval characteristics are encoded in a domain name portion of said URL.

24. The computer program product of claim 17 wherein said
information is retrieved from a server by a client on a computer network.

25. The method of claim 1 wherein said method for obtaining one
or more retrieval characteristics comprises publishing one or more retrieval
characteristics prior to retrieval of said information.

26. The method of claim 1 wherein said method for obtaining one
or more retrieval characteristics comprises:

transmitting encoding information, said encoding information
defining encoded information regarding one or more retrieval characteristics;
transmitting one or more retrieval characteristics in the form of
encoding information for said information to be retrieved prior to retrieval
of said information.

27. The system of claim 9 wherein said code for a method obtaining
one or more retrieval characteristics comprises computer readable program
code configured to cause a computer to publish one or more retrieval
characteristics prior to retrieval of said information.

28. The system of claim 9 wherein said code for a method obtaining
one or more retrieval characteristics comprises:

a method transmitting encoding information, said encoding
information defining encoded information regarding one or more retrieval
characteristics;

a method transmitting one or more retrieval characteristics in the
form of encoding information for said information to be retrieved prior to
retrieval of said information.

29. The computer program product of claim 17 wherein said
computer readable program code to obtain one or more retrieval
characteristics comprises computer readable program code configured to cause
a computer to publish one or more retrieval characteristics prior to retrieval
of said information.

30. The computer program product of claim 17 wherein said
computer readable program code to obtain one or more retrieval
characteristics comprises:

computer readable program code configured to cause a computer to
transmit encoding information, said encoding information defining encoded
information regarding one or more retrieval characteristics;

computer readable program code configured to cause a computer to
transmit one or more retrieval characteristics in the form of encoding
information for said information to be retrieved prior to retrieval of said
information.

METHOD AND APPARATUS FOR ENCODING CONTENT

CHARACTERISTICS

BACKGROUND OF THE INVENTION

1. FIELD OF THE INVENTION

This invention relates to the field of computer software, and, more 
specifically, to optimizing network traffic. 

Portions of the disclosure of this patent document contain material
that is subject to copyright protection. The copyright owner has no objection
to the facsimile reproduction by anyone of the patent document or the patent
disclosure as it appears in the Patent and Trademark Office file or records, but
otherwise reserves all copyright rights whatsoever. Sun, Sun Microsystems,
the Sun logo, Solaris, Java, JavaOS, JavaStation, HotJava Views and all
Java-based trademarks and logos are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States and other countries.

2. BACKGROUND ART

In a computer network environment, a computer user (client) may try
to obtain a file from a central storage location (server). Existing schemes can
waste time looking for the file and can often use an inefficient delivery
method to provide the file to the user. These problems can be understood by
reviewing networks and how they work.

A. Networks



In modern computing environments, it is commonplace to employ
multiple computers or workstations linked together in a network to
communicate between, and share data with, network users. A network also
may include resources, such as printers, modems, file servers, etc., and may
also include services, such as electronic mail. Transferring information
across a network may be a time consuming process. The prior art does not
provide an efficient manner to optimize the transfer and retrieval of
information on a network.



A network can be a small system that is physically connected by cables 
or via wireless communication (a local area network or "LAN"), or several 
separate networks can be connected together to form a larger network (a wide 
area network or "WAN"). Other types of networks include the internet,
telecom networks, the World Wide Web, intranets, extranets, wireless networks,
and other networks over which electronic, digital, and/or analog data may be 
communicated. 



The Internet is a client/server system. A "client" is the computer that 
you use to access the Internet. When you log onto the World Wide Web 
portion of the Internet, you view "web pages" that are stored on a remote 
"server" computer. Information including data, files, and the web pages to be 
viewed are often transferred between the client and the server. Depending 
on the type of information transferred, the server or client may have to 
evaluate the information prior to processing. Additionally, one method for 
transferring the data may be more efficient than another method depending
on the type of data being transferred. Some background on the Internet helps
provide an understanding of these problems.

The Internet is a worldwide network of interconnected computers. An
Internet client accesses a computer on the network via an Internet provider.
An Internet provider is an organization that provides a client (e.g., an
individual or other organization) with access to the Internet (via analog
telephone line or Integrated Services Digital Network line, for example). A
client can, for example, download a file from or send an electronic mail
message to another computer/client using the Internet. An Intranet is an
internal corporate or organizational network that uses many of the same
communications protocols as the Internet. The terms Internet, World Wide
Web (WWW), and Web as used herein include the Intranet as well as the
Internet.

The components of the WWW include browser software, network
links, and servers. The browser software, or browser, is a user-friendly
interface (i.e., front-end) that simplifies access to the Internet. A browser
allows a client to communicate a request without having to learn a
complicated command syntax, for example. A browser typically provides a
graphical user interface (GUI) for displaying information and receiving input.
Examples of browsers currently available include Netscape Navigator and
Internet Explorer.

A browser displays information to a client or user as pages or
documents. A language called Hypertext Markup Language (HTML) is used
to define the format for a page to be displayed in the browser. A Web page is
transmitted to a client as an HTML document. The browser executing at the
client parses the document and produces and displays a Web Page based on
the information in the HTML document. Consequently, the HTML
document defines the Web Page that is rendered at runtime on the browser.



B. Network Communication/Data Transfer

Information servers maintain the information on the WWW and are
capable of processing a client request. To enable the computers on a network
including the WWW to communicate with each other, a set of standardized
rules for exchanging the information between the computers, referred to as a
"protocol", is utilized. Transfer Protocols generally specify the data format,
timing, sequencing, and error checking of data transmissions. Numerous
transfer protocols are used in the networking environment. For example,
one family of transfer protocols is referred to as the transmission control
protocol/internet protocol ("TCP/IP"). The TCP/IP family of transfer
protocols is the set of transfer protocols used on the internet and on many
multiplatform networks.

1. Transfer Protocols

The TCP/IP transfer protocol family is made up of numerous
individual protocols (e.g., file transfer protocol ("FTP"), transmission control
protocol ("TCP"), and network terminal protocol ("TELNET")). The TCP
protocol is responsible for breaking up a message to be transmitted into
datagrams of manageable size, reassembling the datagrams at the receiving
end, resending any datagrams that get lost (or are not transferred), and
reordering the data (from the datagrams) in the appropriate order. A
datagram is a unit of data or information (also referred to as a packet) that is
transferred or passed across the internet. A datagram contains a source and
destination address along with the data. The TCP transfer protocol is often
utilized to transmit large amounts of information because of its ability to
break up the information into datagrams and reassemble the information at
the receiving end.

Another transfer protocol that is utilized to control the transfer of
information is the user datagram protocol ("UDP"). UDP is designed for
applications and data transmissions where sequences of datagrams do not
need to be reassembled at the receiving end. UDP does not keep track of what
has been transmitted in order to resend a datagram if necessary. Additionally,
UDP's header information (information regarding the source and destination
and other relevant information) is shorter than the header information
utilized in TCP.



2. Application Protocols 

To utilize a Transfer Protocol to transfer information, an
Application Protocol that defines a set of commands which one machine
sends to another is utilized (e.g., commands to specify who the sender of the
message is, who it is being sent to, and the text of the message). The Transfer
Protocol (e.g., TCP or UDP) is utilized to ensure that the Application Protocol
commands are completely transmitted to the receiving end. HyperText
Transfer Protocol (HTTP) is the standard application protocol for
communication with an information server on the WWW. HTTP has

maintaining of information and fetched documents by the proxy 102 is
referred to as caching and the information maintained in the proxy 102 is
referred to as a cache or proxy cache.

A proxy 102 may be viewed as an intermediary between the server 104
and client 100. Referring to Figure 1 and Figure 2, at step 202, the client 100
requests information. At step 204, the request is forwarded to the proxy 102.
At step 206, the client 100 first checks the proxy cache to see if the relevant
information is maintained by the proxy 102. If the proxy cache contains the
information, the client 100 does not need to contact the server 104 and loads
the information directly from the proxy 102 at step 208. Alternatively, if the
proxy cache does not contain the relevant information, the request is
forwarded to the server 104 at step 210. At step 212, the client 100 retrieves the
information from the server 104. When http is the protocol that is being
transmitted over the internet protocol, the proxy 102 is referred to as a web
proxy.
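
The lookup order of steps 206 through 212 can be sketched as follows. This is
an illustrative model only: the function fetch_via_proxy, the dictionary used
as the proxy cache, and the fetch_from_server callable are hypothetical
stand-ins and not part of the described system.

```python
def fetch_via_proxy(url, proxy_cache, fetch_from_server):
    """Return (data, source) following the prior art proxy flow."""
    if url in proxy_cache:                 # step 206: check the proxy cache
        return proxy_cache[url], "proxy"   # step 208: load from the proxy
    data = fetch_from_server(url)          # steps 210-212: go to the server
    proxy_cache[url] = data                # proxy keeps the fetched document
    return data, "server"
```

The first request for a document pays for both the cache check and the server
fetch; only repeated requests benefit from the cache.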

To protect information in internal computer networks from external
access, a firewall is utilized. A firewall is a mechanism that blocks access
between the client and the server. To provide limited access to information,
a proxy or proxy server may sit atop a firewall and act as a conduit, providing
a specific connection for each network connection. Proxy software retains the
ability to communicate with external sources, yet is trusted to communicate
with the internal network. For example, proxy software may require a
username and password to access certain sections of the internal network and
completely block other sections from any external access.

C. Addressing Scheme and Client/Server Data Retrieval

An addressing scheme is employed to identify Internet resources (e.g.,
HTTP server, file or program). This addressing scheme is called Uniform
Resource Locator (URL). A URL may contain the application protocol to use
when accessing the server (e.g., HTTP), the Internet domain name (also
referred to as the server host name) of the site on which the server is
running, the port number of the server (the port number may not be
specified in the URL but obtained by translating the server host name), and
the location of the resource in the file structure of the server. For example,
the URL "http://www.sunlabs.com/research/hsn/index.html" specifies the
application protocol ("http"), the server host name ("www.sunlabs.com"),
and the filename to be retrieved ("/research/hsn/index.html").
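
For illustration, the portions of the example URL can be separated with the
Python standard library. The helper name url_components is hypothetical; the
port is reported as None when the URL does not specify one.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Return (application protocol, host name, port, file location)."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path
```

Called on "http://www.sunlabs.com/research/hsn/index.html", this yields the
protocol "http", the host name "www.sunlabs.com", no explicit port, and the
file location "/research/hsn/index.html".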

If the client request is for a file, the HTTP server locates the file and
sends it to the client. An HTTP server also has the ability to delegate work to
Common Gateway Interface (CGI) programs. The CGI specification defines
the mechanisms by which HTTP servers communicate with gateway
programs. A gateway program is referenced using a URL. The HTTP server
activates the program specified in the URL and uses CGI mechanisms to pass
program data sent by the client to the gateway program. Data is passed from
the server to the gateway program via command-line arguments, standard
input, or environment variables. The gateway program processes the data,
generates an HTML document, and returns the HTML document as its
response to the server using CGI (via standard output, for example). The
server forwards the HTML document to the client using HTTP.

Once files have been retrieved, the client may utilize or process the file.
For example, if an HTML document is retrieved, a client's web browser may
parse the HTML document and display the document. Depending on the
type of file retrieved, the client may activate an application to process the file.
For example, if a word processing document is retrieved, the client may
activate a word processor to process the document. Alternatively, if an image
file is retrieved, an image viewer may be activated to process and display the
image.

To identify the type of file that is retrieved so that the client may know 
how to process the file subsequent to retrieval, a file suffix or extension may 
be utilized. A file extension or suffix often consists of a period "." and several 
letters that are attached to the end of a file name. For example, an HTML 
document may end with the suffix ".htm" or ".html" (e.g., "index.htm" or
"home.html"), a word processing document filename may end with the
suffix ".doc" (e.g., "report.doc" or "letter.doc"), a JPEG (Joint Photographic
Experts Group) image filename may end with the suffix ".jpg" (e.g.,
"image.jpg" or "picture.jpg"), and a postscript document (document created
in the postscript page description language) may end with the suffix ".ps" 
(e.g., "calendar.ps" or "font.ps"). 

Upon receiving a file, the client browser will typically examine the 
extension to determine how to process the file after receipt (e.g., launch an 
application program to process the file). 
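
A minimal sketch of this suffix-based dispatch is given below. The table
HANDLERS and the function handler_for are hypothetical, and the handler names
are placeholders rather than real application identifiers.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed mapping from file suffix to the kind of application launched.
HANDLERS = {
    ".htm": "HTML browser",
    ".html": "HTML browser",
    ".doc": "word processor",
    ".jpg": "image viewer",
    ".ps": "PostScript viewer",
}


def handler_for(url):
    """Pick a handler from the suffix of the file location in the URL."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return HANDLERS.get(ext, "unknown")
```

Note that the suffix is consulted only after the file has arrived; it plays
no part in choosing how the file is transferred.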

The above described methods are slow and inefficient in retrieving a
file from the client or server. For example, with a web proxy, the client first
searches the web proxy cache for the relevant web pages or information and if
not present then processes the request with the server requiring a second
search. Additionally, although it may be more efficient to utilize a UDP
transfer protocol instead of TCP (i.e., for smaller files), this is not done.
Referring to Figure 3, the prior art first attempts to transfer information using
UDP at step 302. At step 304, a determination is made regarding whether UDP
was acceptable and was utilized to transmit the requested information. If the
transfer attempt failed and UDP was not acceptable, the web browser will
attempt the transfer using TCP at step 306. (Repetitively attempting a transfer
with an improper transfer protocol creates a large overhead. Further, the
prior art does not provide for an efficient method to optimize the retrieval of
a file or information from a server or client). If the transfer using UDP is
acceptable, the transfer proceeds using UDP at step 309.

SUMMARY OF THE INVENTION

A method and apparatus for encoding characteristics for the retrieval 
of information. In a network of computers, clients and servers communicate 
and exchange data and information. Depending on characteristics of the data
or information, certain methods for exchanging or retrieving the
information may be preferred. For example, if the information is too large to
utilize a UDP (User Datagram Protocol) transfer protocol, then the TCP
(Transmission Control Protocol) transfer protocol may be preferred. In
addition, if the information is not cacheable, then it is preferable to retrieve
the information directly from the server instead of searching the cache first. 

A URL (Uniform Resource Locator) is utilized on the internet to
specify the application protocol (e.g., http), the domain name (e.g.,
www.sun.com), and file location (e.g., /users/hcn/index.html). The suffix of
a file indicator is utilized to identify how to process the data or information
subsequent to retrieval.

One or more embodiments of the invention provide for encoding
characteristics of data to be transferred that indicate or hint at an optimal
method to retrieve the data. For example, the URL may specify that TCP is
the preferred transfer protocol, thereby avoiding an attempted transfer using
UDP. Additionally, the encoding may specify that the data is not cacheable so
that the client may retrieve the information directly from the server instead
of searching the proxy cache. This encoded information may be published by
the server and parsed by the client prior to executing data retrieval. Thus, the
client may retrieve the data efficiently, decreasing the overhead utilized and
influencing the behavior of the data transfer over a network.
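
The saving can be illustrated with a small model that counts transfer
attempts. The function transfer and its send_udp and send_tcp callables are
hypothetical; send_udp is assumed to return None when UDP is not acceptable
for the requested information.

```python
def transfer(url, send_udp, send_tcp, hint=None):
    """Return (data, list of protocols attempted).

    Without a hint, UDP is tried first and TCP is the fallback, as in the
    prior art. With hint == "TCP", the UDP attempt is skipped.
    """
    attempts = []
    if hint != "TCP":
        attempts.append("UDP")
        data = send_udp(url)
        if data is not None:       # UDP was acceptable
            return data, attempts
    attempts.append("TCP")
    return send_tcp(url), attempts
```

For information that is too large for UDP, the unhinted call makes two
attempts while the hinted call makes one.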

The characteristics or preferred retrieval method may be encoded in
any portion of a URL. Additionally, one or more embodiments of the
invention provide for backwards compatibility with existing internet
browsers by encoding the characteristics in the file location portion of the
URL instead of the application protocol identifier portion.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates the client, server, and proxy relationship. 

Figure 2 demonstrates a prior art method for retrieving information
with a proxy.

Figure 3 demonstrates a prior art method for retrieving information
using TCP or UDP.

Figure 4 is a block diagram of one embodiment of a computer system
capable of providing a suitable execution environment for one or more
embodiments of the invention.

Figure 5 demonstrates a method for the encoding and use of
information in URLs according to one or more embodiments of the
invention.

DETAILED DESCRIPTION OF THE INVENTION

The invention is a method and apparatus for encoding content 
characteristics. In the following description, numerous specific details are set 
forth to provide a more thorough description of embodiments of the 
invention. It is apparent, however, to one skilled in the art, that the 
invention may be practiced without these specific details. In other instances, 
well known features have not been described in detail so as not to obscure the 
invention. 

Embodiment of Computer Execution Environment (Hardware)

An embodiment of the invention can be implemented as computer 
software in the form of computer readable code executed on a general 
purpose computer such as computer 400 illustrated in Figure 4, or in the form 
of bytecode class files running on such a computer. A keyboard 410 and 
mouse 411 are coupled to a bi-directional system bus 418. The keyboard and 
mouse are for introducing user input to the computer system and 
communicating that user input to processor 413. Other suitable input devices 
may be used in addition to, or in place of, the mouse 411 and keyboard 410. 
I/O (input/output) unit 419 coupled to bi-directional system bus 418 
represents such I/O elements as a printer, A/V (audio/video) I/O, etc. 

Computer 400 includes a video memory 414, main memory 415 and 
mass storage 412, all coupled to bi-directional system bus 418 along with 
keyboard 410, mouse 411 and processor 413. The mass storage 412 may 
include both fixed and removable media, such as magnetic, optical or 
magnetic optical storage systems or any other available mass storage
technology. Bus 418 may contain, for example, thirty-two address lines for
addressing video memory 414 or main memory 415. The system bus 418 also
includes, for example, a 32-bit data bus for transferring data between and
among the components, such as processor 413, main memory 415, video
memory 414 and mass storage 412. Alternatively, multiplex data/address
lines may be used instead of separate data and address lines.

In one embodiment of the invention, the processor 413 is a 
microprocessor manufactured by Motorola, such as the 680X0 processor or a 
microprocessor manufactured by Intel, such as the 80X86, or Pentium 
processor, or a SPARC microprocessor from Sun Microsystems, Inc. 
However, any other suitable microprocessor or microcomputer may be 
utilized. Main memory 415 is comprised of dynamic random access memory 
(DRAM). Video memory 414 is a dual-ported video random access memory. 
One port of the video memory 414 is coupled to video amplifier 416. The 
video amplifier 416 is used to drive the cathode ray tube (CRT) raster monitor 
417. Video amplifier 416 is well known in the art and may be implemented 
by any suitable apparatus. This circuitry converts pixel data stored in video 
memory 414 to a raster signal suitable for use by monitor 417. Monitor 417 is 
a type of monitor suitable for displaying graphic images. 

Computer 400 may also include a communication interface 420 
coupled to bus 418. Communication interface 420 provides a two-way data 
communication coupling via a network link 421 to a local network 422. For 
example, if communication interface 420 is an integrated services digital
network (ISDN) card or a modem, communication interface 420 provides a
data communication connection to the corresponding type of telephone line,
which comprises part of network link 421. If communication interface 420 is
a local area network (LAN) card, communication interface 420 provides a data
communication connection via network link 421 to a compatible LAN.
Wireless links are also possible. In any such implementation,
communication interface 420 sends and receives electrical, electromagnetic or
optical signals which carry digital data streams representing various types of
information.

Network link 421 typically provides data communication through one
or more networks to other data devices. For example, network link 421 may
provide a connection through local network 422 to local server computer 423
or to data equipment operated by an Internet Service Provider (ISP) 424. ISP
424 in turn provides data communication services through the world wide
packet data communication network now commonly referred to as the
"Internet" 425. Local network 422 and Internet 425 both use electrical,
electromagnetic or optical signals which carry digital data streams. The
signals through the various networks and the signals on network link 421
and through communication interface 420, which carry the digital data to and
from computer 400, are exemplary forms of carrier waves transporting the
information.



Computer 400 can send messages and receive data, including program 
code, through the network(s) / network link 421, and communication 
interface 420. In the Internet example, remote server computer 426 might 
transmit a requested code for an application program through Internet 425, 
ISP 424, local network 422 and communication interface 420. In accord with 
the invention, one such application is that of remotely configuring a 
computer.

The received code may be executed by processor 413 as it is received,
and/or stored in mass storage 412, or other non-volatile storage for later
execution. In this manner, computer 400 may obtain application code in the
form of a carrier wave.

Application code may be embodied in any form of computer program
product. A computer program product comprises a medium configured to
store or transport computer readable code, or in which computer readable
code may be embedded. Some examples of computer program products are
CD-ROM disks, ROM cards, floppy disks, magnetic tapes, computer hard
drives, servers on a network, and carrier waves.



The computer systems described above are for purposes of example 
only. An embodiment of the invention may be implemented in any type of 
computer system or programming or processing environment.



Embodiment of Software Apparatus for Encoding Content Characteristics 

20 In one or more embodiments of the invention, a variety of facts or 

hints are encoded into URLs to enable client side optimizations as well as 
optimizations at intermediate processing points, such as proxy caches or 
application layer firewalls. The facts or hints are characteristics of the data 
that is to be transferred that allow for optimization in the retrieval of the 

25 information from the server thereby influencing the transfer of information 
across a network. Further, the facts or hints may be encoded into any part of a 
URL. 
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Figure 5 demonstrates a method for the encoding and use of 
information in URLs according to one or more embodiments of the 
invention. A server has knowledge about the details and characteristics of 
the information and files the server maintains (e.g., the length of the file, or 
the fact that the file is cacheable). Such information and characteristics are 
known by the server prior to a file being requested by a client. At step 500, the 
server or web server may publish one or more of the characteristics regarding 
the file or transfer of the file in the URL for the file. Such a publication may 
be a universal convention that applies across all clients and servers. 
Alternatively, the server may provide the encoded information to the clients 
in a manner that is similar to the way servers provide cookies (a small piece 
of information that can later be read back from a browser) to a client or 
browser. For example, when a server first responds to a client's request, the 
server may include an additional (and optional) field in the reply that 
informs the client about the various retrieval methods and how such 
information is encoded (referred to as a retrieval identifier). When future 
information is encoded with a retrieval identifier, the client maintains the 
knowledge regarding the meaning of the retrieval identifier and how to 
retrieve the encoded information. For example, a single retrieval identifier 
statement would include the following information: 

<property><pattern><applicable host domain><lifetime> 

where the property field may be "not-cacheable" or "TCP-only"; the 
pattern field may contain "*-nc-*"; the applicable host domain field may 
be "*.sun.com"; and the lifetime field is an optional field that informs the 
clients when this information expires. 
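
By way of illustration only, a client might represent a retrieval identifier statement and test whether it applies to a given URL as in the following Python sketch. The names RetrievalIdentifier and applies_to, the use of glob-style matching, the choice to match the pattern field against the path portion of the URL, and the example URLs are assumptions made for this sketch; the disclosure does not specify them.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class RetrievalIdentifier:
    """One <property><pattern><applicable host domain><lifetime> statement."""
    prop: str                        # e.g. "not-cacheable" or "TCP-only"
    pattern: str                     # e.g. "*-nc-*"
    host_domain: str                 # e.g. "*.sun.com"
    lifetime: Optional[str] = None   # optional; its format is not given in the text


def applies_to(rule: RetrievalIdentifier, url: str) -> bool:
    """Return True if both the host domain and the pattern match the URL.

    Assumption: the pattern is a glob matched against the URL path.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return fnmatchcase(host, rule.host_domain) and fnmatchcase(parts.path, rule.pattern)


rule = RetrievalIdentifier("not-cacheable", "*-nc-*", "*.sun.com")
print(applies_to(rule, "http://www.sun.com/users/hcn/report-nc-1.html"))  # True
print(applies_to(rule, "http://www.sun.com/users/hcn/index.html"))        # False
```

A client holding such a rule would treat any matching URL as not cacheable until the rule expires.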

At step 502, the client views and examines the published characteristics 
or transmitted information. At step 504, the client transmits the request to 
the server (using an application protocol, e.g., HTTP). The client request uses 
the desired retrieval method. The retrieval method may consist of the 
transfer protocol (i.e., TCP or UDP) or it may consist of caching information 
(i.e., whether the information being transmitted is cacheable or not). At step 
506, the server transmits the information to the client using the specified 
retrieval method. At step 508, the client receives the information and 
processes it as desired. The post-transfer processing may consist of initializing 
an application program to read or display the file. 

Step 500 of Figure 5 demonstrates the encoding aspects of the present 
invention. By encoding retrieval or file characteristics in the URL, the client 
may optimize the retrieval of the information. Some examples of the hints 
or characteristics that can be encoded are "Don't use UDP", "Please use UDP", 
"Don't use TCP", "Please use TCP", "Do not cache", "Cache if possible", or 
"Don't retrieve file from proxy". These messages may be encoded in any 
portion of the URL. 



For example, referring to Figure 3, if the URL specifies that TCP is 
preferred, the client will not attempt file retrieval using UDP first (steps 302- 
304), saving on the wasted overhead of a failed UDP transfer attempt. Instead, 
the client will retrieve the information using TCP at step 306 without a failed 
transfer attempt, eliminating steps 302-304. 


Referring to Figure 2, if the URL specifies that the information is not 
cacheable or is not in cache, the client web browser will go directly to the 
server to retrieve the information at step 212 instead of performing a cache 
lookup on the web proxy first, eliminating steps 204-210. By retrieving the 
information directly from the server, the latency for repeat requests is 
reduced, the traffic in the network is reduced, and the load on the proxies is 
reduced. 
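
The effect of the two hints discussed above on the order of client operations may be sketched as follows. This is an illustrative Python sketch only; the Hints class, the retrieval_plan function, and the step labels are hypothetical names introduced here, and the unhinted ordering (proxy lookup, then a UDP attempt, then TCP) is inferred from the description of Figures 2 and 3 rather than taken from any stated implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hints:
    """Hypothetical decoded hints; the defaults mean the URL carried no hint."""
    transport: Optional[str] = None   # "tcp" or "udp"
    bypass_proxy: bool = False        # True for "not cacheable" / "not in cache"


def retrieval_plan(hints: Hints) -> List[str]:
    """List the operations a client would attempt, in order.

    A proxy cache hit or a successful transfer ends the plan early.
    """
    plan: List[str] = []
    if not hints.bypass_proxy:
        plan.append("proxy-cache-lookup")   # steps 204-210 of Figure 2
    if hints.transport == "tcp":
        plan.append("server-via-tcp")       # step 306 only; steps 302-304 skipped
    else:
        plan.append("server-via-udp")       # UDP attempt, steps 302-304
        plan.append("server-via-tcp")       # used if the UDP attempt fails
    return plan


print(retrieval_plan(Hints()))
print(retrieval_plan(Hints(transport="tcp", bypass_proxy=True)))
```

With both hints present the plan reduces to a single direct TCP retrieval, which is the saving described above.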


Thus, the URL is encoded with characteristics that will optimize the 
retrieval of files. According to one or more embodiments, the protocol 
identifier can be supplemented with the desired characteristic/retrieval 
method. For example, to specify that UDP is the preferable transfer protocol, 
the protocol identifier may be "HTTP/UDP" instead of merely "HTTP", or to specify that 
the information is cacheable, the protocol identifier may be "HTTP/C". However, web 
browsers may not be configured to process arbitrary data such as a preferred 
retrieval method in the application protocol identifier portion of a URL. 
Thus, users of prior art web browsers will be unable to process any URLs that 
have such data in the application protocol identifier portion of the URL. The 
ability to use an old web browser with a new data format is referred to as 
backwards compatibility. 
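
The compatibility problem can be observed with an existing URL parser. The following sketch uses Python's standard urllib.parse module merely as a stand-in for software that predates the encoding; the full URL form "HTTP/UDP://..." is an assumption made for illustration, since the text gives only the protocol identifier itself.

```python
from urllib.parse import urlsplit

plain = urlsplit("http://www.sunlabs.com/research/hsn/index.html")
tagged = urlsplit("HTTP/UDP://www.sunlabs.com/research/hsn/index.html")

# The ordinary URL is split into a scheme and a host.
print(plain.scheme, plain.netloc)

# "/" is not a legal scheme character, so the parser finds neither a
# scheme nor a host and treats the whole string as a path.
print(repr(tagged.scheme), repr(tagged.netloc))
```

Software that relies on such a parser cannot tell which protocol or host was intended, which is why the embodiments below move the hint elsewhere in the URL.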

To maintain backwards compatibility while providing the desired 
characteristics in the URL, one or more embodiments of the invention 
encode the characteristics in the Internet domain name or resource server 
location portion of the URL. For example, in one or more embodiments, the 
suffix of the resource server location portion may be supplemented with 
additional characters to indicate the retrieval method. In such an 
embodiment, the file suffix "htmlt" in the URL 
"http://www.sunlabs.com/research/hsn/index.htmlt" may indicate that TCP 
is the preferable transfer protocol to utilize for the file transfer. Similarly, the 
file suffix "htmlu" in the URL 
"http://www.sunlabs.com/research/hsn/index.htmlu" may indicate that 
UDP is the preferable transfer protocol to utilize for the file transfer. 
Alternatively, the suffix of the URL could also be supplemented to indicate 
whether the information is cacheable or is maintained in the proxy cache. 
For example, the file suffix "htmlunc" in the URL 
"http://www.sunlabs.com/research/hsn/index.htmlunc" may indicate that 
UDP is the preferable transfer protocol and the information is not cacheable. 
Any other interesting information may be encoded as well, such as the 
lifetime (for how long it is cacheable). 
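
One possible way for a client to decode the suffix examples above is sketched below in Python. The function decode_suffix, the tables BASE_EXTENSIONS and FLAGS, and the rule that unrecognized trailing characters leave the URL unhinted are assumptions of this sketch; only the meanings of "t", "u", and "nc" come from the examples in the text.

```python
import posixpath
from typing import Dict
from urllib.parse import urlsplit

# Assumption: the client knows which unmodified suffixes may carry hints.
BASE_EXTENSIONS = ("html",)
# Hint characters taken from the "htmlt", "htmlu", and "htmlunc" examples.
FLAGS = {
    "t": ("transport", "tcp"),
    "u": ("transport", "udp"),
    "nc": ("cacheable", False),
}


def decode_suffix(url: str) -> Dict[str, object]:
    """Decode hint characters appended to the file suffix of a URL."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lstrip(".")
    base = next((b for b in BASE_EXTENSIONS if ext.startswith(b)), None)
    if base is None:
        return {}
    hints: Dict[str, object] = {}
    rest = ext[len(base):]
    while rest:
        for code, (key, value) in FLAGS.items():
            if rest.startswith(code):
                hints[key] = value
                rest = rest[len(code):]
                break
        else:
            return {}   # unknown characters: treat the URL as unhinted
    return hints


print(decode_suffix("http://www.sunlabs.com/research/hsn/index.htmlt"))
print(decode_suffix("http://www.sunlabs.com/research/hsn/index.htmlunc"))
```

A lifetime or other characteristic could be added to the table in the same manner, provided the codes remain unambiguous.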


One or more embodiments of the invention may provide for the 
retrieval characteristics to be encoded in the domain name or resource server 
location portion of the URL and hide that information from display in the 
web browser. Such information may be passed in the form of a parameter in 
HTML. For example, the file suffix "html?u" in the URL 
"http://www.sunlabs.com/research/hsn/index.html?u" passes the parameter 
"u" indicating a UDP transfer. 

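The parameter form may be read with an ordinary URL parser, as the following Python sketch suggests. The function transport_from_query is a hypothetical name, and the "t" code for TCP is assumed by analogy with the "htmlt" suffix; the text itself gives only "u" for UDP.

```python
from typing import Optional
from urllib.parse import urlsplit

# "u" is from the text; "t" is an assumed counterpart for TCP.
QUERY_CODES = {"u": "udp", "t": "tcp"}


def transport_from_query(url: str) -> Optional[str]:
    """Return the transfer protocol hinted by the URL parameter, if any."""
    return QUERY_CODES.get(urlsplit(url).query)


url = "http://www.sunlabs.com/research/hsn/index.html?u"
print(urlsplit(url).path)           # the file location still ends in ".html"
print(transport_from_query(url))    # udp
```

Because the path is unchanged, software that does not understand the parameter can still locate and display the file.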
By encoding the information in the file information or resource server 
location portion of a URL, with several pages based in the same server, some 
pages may be retrieved in one manner (e.g., from the proxy server or using 
UDP), while other pages may be retrieved in another manner (e.g., directly 
from the server or using TCP). Further, by encoding information in the URL 
prior to retrieving the information, the actual transmission of the file or 
25 information to the client may be optimized. Once the file is retrieved in an 
optimized manner by the client, it can then be processed according to the 
methods of the prior art (i.e., by invoking an image viewer, word processor, 
or HTML document browser to process the retrieved information). 

Thus, a method and apparatus for encoding content characteristics for 
the retrieval of information is described in conjunction with one or more 
specific embodiments. The invention is defined by the claims and their full 
scope of equivalents. 

CLAIMS 

1. A method for retrieving information comprising: 

obtaining one or more retrieval characteristics prior to retrieval of said 
information; and 

retrieving said information based on one or more of said retrieval 
characteristics. 

2. The method of claim 1 wherein one or more of said retrieval 
characteristics consists of an indicator to utilize TCP as a transfer protocol. 

3. The method of claim 1 wherein one or more of said retrieval 
characteristics consists of an indicator to utilize UDP as a transfer protocol. 

4. The method of claim 1 wherein one or more of said retrieval 
characteristics consists of an indicator regarding whether said information 
should be obtained directly from the server. 

5. The method of claim 1 wherein said retrieval characteristics are 
encoded in a URL. 

6. The method of claim 5 wherein said retrieval characteristics are 
encoded in a file location portion of said URL. 

7. The method of claim 5 wherein said retrieval characteristics are 
encoded in a domain name portion of said URL. 

8. The method of claim 1 wherein said information is retrieved 
from a server by a client on a computer network. 

9. A system comprising 
a processor; 

a memory coupled to said processor; 

code executed by said processor configured to retrieve information; 
said code comprising: 

a method obtaining one or more retrieval characteristics prior 
to retrieval of said information; and 

a method retrieving said information based on one or more of 
said retrieval characteristics. 

10. The system of claim 9 wherein one or more of said retrieval 
characteristics consists of an indicator to utilize TCP as a transfer protocol. 

11. The system of claim 9 wherein one or more of said retrieval 
characteristics consists of an indicator to utilize UDP as a transfer protocol. 

12. The system of claim 9 wherein one or more of said retrieval 
characteristics consists of an indicator regarding whether said information 
should be obtained directly from the server. 

13. The system of claim 9 wherein said retrieval characteristics are 
encoded in a URL. 

14. The system of claim 13 wherein said retrieval characteristics are 
encoded in a file location portion of said URL. 

15. The system of claim 13 wherein said retrieval characteristics are 
encoded in a domain name portion of said URL. 

16. The system of claim 9 wherein said code is executed on a 
computer network and said information is retrieved from a server by a client. 

17. A computer program product comprising 

a computer usable medium having computer readable program code 
embodied therein configured to retrieve information, said computer program 
product comprising: 

computer readable code configured to cause a computer to obtain one 
or more retrieval characteristics prior to retrieval of said information; and 

computer readable code configured to cause a computer to retrieve said 
information based on one or more of said retrieval characteristics. 



18. The computer program product of claim 17 wherein one or 
more of said retrieval characteristics consists of an indicator to utilize TCP as 
a transfer protocol. 


19. The computer program product of claim 17 wherein one or 
more of said retrieval characteristics consists of an indicator to utilize UDP as 
a transfer protocol. 

20. The computer program product of claim 17 wherein one or 
more of said retrieval characteristics consists of an indicator regarding 
whether said information should be obtained directly from the server. 

21. The computer program product of claim 17 wherein said 
retrieval characteristics are encoded in a URL. 

22. The computer program product of claim 21 wherein said 
retrieval characteristics are encoded in a file location portion of said URL. 

23. The computer program product of claim 21 wherein said 
retrieval characteristics are encoded in a domain name portion of said URL. 






24. The computer program product of claim 17 wherein said 
information is retrieved from a server by a client on a computer network. 

25. The method of claim 1 wherein said method for obtaining one 
or more retrieval characteristics comprises publishing one or more retrieval 
characteristics prior to retrieval of said information. 

26. The method of claim 1 wherein said method for obtaining one 
or more retrieval characteristics comprises: 

transmitting encoding information, said encoding information 
defining encoded information regarding one or more retrieval characteristics; 

transmitting one or more retrieval characteristics in the form of 
encoding information for said information to be retrieved prior to retrieval 
of said information. 

27. The system of claim 9 wherein said code for a method obtaining 
one or more retrieval characteristics comprises computer readable program 
code configured to cause a computer to publish one or more retrieval 
characteristics prior to retrieval of said information. 



28. The system of claim 9 wherein said code for a method obtaining 
one or more retrieval characteristics comprises: 

a method transmitting encoding information, said encoding 
information defining encoded information regarding one or more retrieval 
characteristics; 

a method transmitting one or more retrieval characteristics in the 
form of encoding information for said information to be retrieved prior to 
retrieval of said information. 

29. The computer program product of claim 17 wherein said 
computer readable program code to obtain one or more retrieval 
characteristics comprises computer readable program code configured to cause 
a computer to publish one or more retrieval characteristics prior to retrieval 
of said information. 

30. The computer program product of claim 17 wherein said 
computer readable program code to obtain one or more retrieval 
characteristics comprises: 

computer readable program code configured to cause a computer to 
transmit encoding information, said encoding information defining encoded 
information regarding one or more retrieval characteristics; 

computer readable program code configured to cause a computer to 
transmit one or more retrieval characteristics in the form of encoding 
information for said information to be retrieved prior to retrieval of said 
information. 